Finding Related Pages in the World Wide Web

Authors

Jeffrey Dean

Monika Henzinger

Abstract

When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach to web searching, where the input to the search process is not a set of query terms but the URL of a page, and the output is a set of related web pages. A related web page is one that addresses the same topic as the original page. For example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers. We describe two algorithms to identify related web pages. These algorithms use only the connectivity information in the web (i.e., the links between pages), not the content of pages or usage information. We have implemented both algorithms and measured their runtime performance. To evaluate the effectiveness of our algorithms, we performed a user study comparing them with Netscape's "What's Related" service [12]. Our study showed that the precision at 10 of our two algorithms is 73% and 51% better, respectively, than that of Netscape, even though Netscape uses content and usage-pattern information in addition to connectivity information.
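The abstract does not spell out either algorithm. As a hedged illustration of what a connectivity-only approach can look like, the sketch below ranks candidate pages by cocitation. A candidate's cocitation degree is the number of pages that link to both the target and the candidate. This is a generic, well-known link-analysis technique and should not be read as the authors' exact method.

The sketch also includes a helper for precision at k. It assumes the common definition: the number of relevant pages among the first k results, divided by k. Several names are hypothetical and are not taken from the paper: `related_by_cocitation`, `precision_at_k`, the `max_parents` / `max_children_per_parent` caps and their default values, and the toy link graph.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical in-memory link graph: page -> list of pages (in-links or out-links).
LinkMap = Dict[str, List[str]]


def related_by_cocitation(
    target: str,
    inlinks: LinkMap,
    outlinks: LinkMap,
    max_parents: int = 100,             # illustrative cap, not from the paper
    max_children_per_parent: int = 50,  # illustrative cap, not from the paper
    top_n: int = 10,
) -> List[str]:
    """Rank pages by cocitation degree: how many parents they share with `target`."""
    counts: Counter = Counter()
    # Deduplicate parents while keeping order, then cap how many we expand.
    parents = list(dict.fromkeys(inlinks.get(target, [])))[:max_parents]
    for parent in parents:
        seen: Set[str] = set()
        for child in outlinks.get(parent, [])[:max_children_per_parent]:
            # Skip the target itself and repeated links from the same parent.
            if child == target or child in seen:
                continue
            seen.add(child)
            counts[child] += 1
    # Highest cocitation degree first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]


def precision_at_k(results: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the first k results judged relevant (always divides by k)."""
    return sum(1 for page in results[:k] if page in relevant) / k


# Toy graph: only the two newspaper URLs come from the abstract; the rest are made up.
EXAMPLE_INLINKS: LinkMap = {
    "www.nytimes.com": ["hub1.example", "hub2.example", "hub3.example"],
}
EXAMPLE_OUTLINKS: LinkMap = {
    "hub1.example": ["www.nytimes.com", "www.washingtonpost.com", "news.example"],
    "hub2.example": ["www.nytimes.com", "www.washingtonpost.com"],
    "hub3.example": ["www.nytimes.com", "www.washingtonpost.com",
                     "news.example", "weather.example"],
}

if __name__ == "__main__":
    # Prints ['www.washingtonpost.com', 'news.example', 'weather.example'].
    print(related_by_cocitation("www.nytimes.com", EXAMPLE_INLINKS, EXAMPLE_OUTLINKS))
```

In this toy graph, all three hubs cite both newspapers, so www.washingtonpost.com ranks first. This mirrors the abstract's example without using any page content. Capping the number of parents and children keeps the work bounded for heavily linked pages. The specific caps shown are arbitrary.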


Similar resources

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...


A Technique for Improving Web Mining using Enhanced Genetic Algorithm

World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines used conventional methods to retrieve information on the Web; however, the search results of these engines are still able to be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...


Expert Discovery: A web mining approach

Expert discovery is a quest in search of finding an answer to a question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” Expert with domain knowledge in any field is crucial for consulting in industry, academia and scientific community. Aim of this study is to address the issues for expert-finding task in real-world community. Collabor...


Finding Related Web Pages Based on Connectivity Information from a Search Engine

This paper proposes a method for finding related Web pages based on connectivity information of hyperlinks. As claimed by Kumar, a complete bipartite graph of Web pages can be regarded as a Web community sharing a common interest. However, preparing Web snapshot data for the search of such communities is not an easy task since the Web is huge and is growing. In our me...


Finding Related Hubs and Authorities

HubFinder is intended to be an algorithm for finding hubs related to an initial base set of pages. We define related as accessible via the link structure of the Web (following either in-going or out-going links). When searching for related hubs, one would probably need to apply the Kleinberg extension several times, because if one starts from a poor hub/authority, the representative pages migh...


Journal:

Computer Networks

Volume 31


Publication date: 1999
